Skip to content

[PLAT-440] roll up routing in metrics view#9180

Open
pjain1 wants to merge 29 commits intomainfrom
rollup_mv
Open

[PLAT-440] roll up routing in metrics view#9180
pjain1 wants to merge 29 commits intomainfrom
rollup_mv

Conversation

@pjain1
Copy link
Copy Markdown
Member

@pjain1 pjain1 commented Apr 3, 2026

  • Add rollup table config to metrics view proto and YAML parser
  • Implement query routing: eligible rollups are selected based on grain derivability, dimension/measure coverage, timezone match, time range alignment, and time coverage
  • Prefer coarsest grain among eligible rollups; break ties by smallest data range
  • For no-time-range queries ("all data"), verify the rollup covers the base table's full range rather than skipping coverage checks

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@pjain1 pjain1 closed this Apr 3, 2026
@pjain1 pjain1 reopened this Apr 3, 2026
@pjain1 pjain1 changed the title roll up routing in metrics view [PLAT-440] roll up routing in metrics view Apr 5, 2026
@pjain1 pjain1 requested a review from begelundmuller April 6, 2026 05:32
Comment thread runtime/metricsview/executor/watermark_cache.go Outdated
Comment thread proto/rill/runtime/v1/resources.proto Outdated
Comment thread proto/rill/runtime/v1/resources.proto Outdated
Comment thread proto/rill/runtime/v1/time_grain.proto Outdated
Comment thread runtime/parser/parse_metrics_view.go Outdated
Comment thread runtime/parser/parse_metrics_view.go Outdated
Comment thread runtime/metricsview/executor/executor_validate.go Outdated
Comment thread runtime/metricsview/executor/executor_validate.go Outdated
Comment thread runtime/metricsview/executor/executor_validate.go Outdated
Comment thread runtime/metricsview/executor/watermark_cache.go Outdated
Comment thread runtime/metricsview/executor/watermark_cache.go Outdated
@pjain1 pjain1 requested a review from begelundmuller April 13, 2026 16:00
Comment thread proto/rill/runtime/v1/resources.proto Outdated
Comment thread runtime/metricsview/executor/executor.go Outdated
Comment thread runtime/metricsview/executor/executor_rewrite_rollup.go
Comment thread runtime/metricsview/executor/executor_rewrite_rollup.go
Comment thread runtime/metricsview/executor/executor_rewrite_rollup.go
Comment thread runtime/metricsview/executor/executor_rewrite_rollup.go
Comment thread runtime/metricsview/executor/executor_rewrite_rollup.go Outdated
Comment thread runtime/resolvers/metricsview_timestamps.go Outdated
@pjain1 pjain1 requested a review from begelundmuller April 16, 2026 11:10
Comment on lines +9 to +11
// RewriteQueryForRollupTest exposes rewriteQueryForRollup for integration tests. This is to prevent cyclic dependency error.
func (e *Executor) RewriteQueryForRollupTest(ctx context.Context, qry *metricsview.Query) (string, bool, error) {
spec, err := e.rewriteQueryForRollup(ctx, qry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little weird that "integration" tests need to access an internal function.

It would be nice if they could hit the real Query code path instead. Since the problem is to find out what table it uses, did you consider finding some way of accessing that? Some quick ideas:

  • Tracking metadata about the latest query and returning it with e.g. e.LatestQueryTable()
  • Recording the trace in-memory and accessing the rollup events

// A coarser grain is derivable from a finer grain only if they are on the
// same branch (or one is sub-day/day and the other is on a branch rooted at day).
// For example, month is derivable from day, but not from week.
func GrainDerivableFrom(queryGrain, rollupGrain runtimev1.TimeGrain) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relevant for external callers of the metricsview package? It seems like an implementation detail only relevant to the rollup executor. If so, consider moving into the executor package as internal functions, or into a separate package with time/rollup utils.

Comment on lines +303 to +307
// time_grain is required
if rollup.TimeGrain == runtimev1.TimeGrain_TIME_GRAIN_UNSPECIFIED {
res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: time_grain is required", i))
continue
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe also validate that it is greater than or equal to the smallest_time_grain

res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: dimension %q not found in metrics view", i, dimName))
continue
}
if err := e.validateDimension(ctx, t, d, cols); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easy/possible to refactor validateAllDimensionsAndMeasures so it can be used here? I remember we added it since sequential dimension/measure validation got very slow for metrics views with many dimensions/measures.

Comment on lines +102 to +107
if len(mv.ValidSpec.Rollups) > 0 && mv.ValidSpec.TimeDimension != "" {
var timeDim string
if q.TimeRange != nil {
timeDim = q.TimeRange.TimeDimension
}
tsRes, err := ResolveTimestampResult(ctx, rt, instanceID, q.MetricsViewName, timeDim, q.SecurityClaims, priority)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it really matters, but is this still needed when timeDim != "" && mv.ValidSpec.TimeDimension != timeDim?

"go.opentelemetry.io/otel/trace"
)

const defaultTimestampsCacheTTL = 5 * time.Minute
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make sense to move this to InstanceConfig so it can be changed dynamically if needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants